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AN  EFFICIENT  DIRECT  SOLVER  FOR 
SEPARABLE  AND  NON-SEPARABLE 
ELLIPTIC  EQUATIONS 


1.  Introduction 

The  numerical  solution  of  elliptic  partial  differential  equations  is  frequently  required  for 
atmospheric  problems  These  equations  may  be  separable  or  non-separable  and  their  solutions 
are  subject  to  have  a variety  of  boundary  conditions.  If  we  write  the  elliptic  finite  difference 
form  of  the  differential  equation  as  A x = F.  where  A is  the  coefficient  matrix.  F.  the  forcing 
function  and  x the  required  solution,  then  A can  be  diagonally  dominant,  weakly  dominant  or 
even  non-diagonally  dominant.  Even  though  a number  of  iterative  and  direct  solvers  are  avail- 
able in  the  literature  for  solving  elliptic  equations,  no  one  method  is  efficient  for  all  types  of 
equations.  The  iterative  methods  such  as  SOR  (successive  over-relaxation).  ADI  (alternate 
direct  implicit)  and  hybrid  methods  such  as  optimized  block  implicit  relaxation  (Dietrich,  et  al.. 
1975)  are  very  efficient  for  diagonal!)  dominant  equations.  But  they  converge  very  slowly  or 
may  even  diverge  for  the  weakly  or  non-diagonally  dominant  equations 

The  direct  solvers  developed  by  Hockney  (1965).  Buneman  (1969)  and  Rosmond  and 
Faulkner  (1976)  are  ver>  powerful,  but  restricted  to  separable  equations.  For  non-separable 
equations  the  direct  solvers  of  Lindzen-Kuo  (1969)  and  Crout  (1941)  are  available  but  require 
large  amounts  of  computer  memory  and  thus  are  limited  to  small  arrays. 

The  error  vector  propagation  method  (EVP)  (Roche:  1971.  1977.  Hirota  et  al.:  1970)  is 
applicable  to  ant  type  of  elliptic  equation  and  requires  an  order  of  magnitude  less  memory  than 
Lindzen-Kuo  However,  as  shown  by  Meavaney  and  Leslie  (1972).  EVP  is  unstable  for  a large 
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number  of  grid  points.  In  this  paper  we  develop  a stabilized  error  vector  propagation  method 
(SEVP)  which  is  stable  for  any  number  of  grid  points  and  retains  most  advantages  of  EVP. 

In  Section  2 a review  of  EVP  upon  which  SEVP  is  based  is  given.  The  SEVP  is 
described  in  Section  3.  A comparison  of  the  computer  time  and  memory  requirements  for 
several  representative  direct  and  iterative  solvers  is  given  in  Section  4.  A summary  is  given  in 
Section  5. 


2.  Error  Vector  Propagation  Method  (EVP) 


A general  two  dimensional  elliptic  equation  in  finite  difference  form  can  be  written  as 


AX(i,j)  x(i  — 1 ,j)  + AY(i,j)  x(i,j  — 1 ) + BB(i,j ) x(i,j)  + 
CXi/.j)  x(i  4-  1 ,j)  + CY(i,j)  x(i,j  + 1 ) = F(i,j) 


(1) 


where  the  coefficients  .4A'.  A V.  BB.  CA'  and  C>  may  be  functions  of  both  / and./,  and  Fis  the 
forcing  function.  We  now  review  the  EVP  procedure  described  by  Roche  0971.  1977)  for 
solving  equation  (1)  over  a rectangular  region  shown  in  Figure  1 using  Diriciet  boundary  con- 
ditions. By  rearranging  the  terms,  equation  (1)  can  be  written  as 


x(i,j  + 1 ) « ( FU,j ) - AXU.j)  x(i  - 1 ,./)  - AY(i.j)  x (/,  j - 1 ) 
- BB(i.j)  a -(/,./)  - CX(i..i ) x (/  + 1 ,j))  ( CYUJ )) 


(2) 


It  is  clear  from  equation  (2)  that  if  we  know  the  solution  on  any  two  consecutive  rows 
then  we  can  march  in  j to  obtain  the  required  solution.  Since  the  solution  is  known  only  on 
the  boundaries  B j.  B- j.  By  and  BA.  the  EVP  method  requires  an  initial  guess  of  the  solution  on 
the  row  interior  to  B,  to  march  forward  in  j . Based  on  this  inital  guess  a particular  solution. 
xp(i.j)  is  obtained  which  satisfies  equation  (2)  and  all  the  boundary  conditions  except  along 
Bz.  If  we  assume  that  the  particular  solution  deviates  from  the  real  solution,  x(i.j).  by 
xH(i,j),  then 
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x (i,j)  “ xp(i,j)  +xH(i,j) 


(3) 

substituting  equation  (3)  into  equation  (2)  and  noting  that  the  particular  solution  satisifies 
equation  (2)  and  boundary  conditions  at  B],By,BA,  we  obtain  the  following  homogeneous 
equation 


xHU,j  + 1 ) = - (AXUJ)  xH(i  - 1 ,j)  + AY(iJ)  xH(i.j  - 1 ) + 


BB(i.j)  xHU,j)  + CXU.j)  xH(i  + 1 ,j))(CYU,j))  _1 


(4) 


with  zero  boundary  conditions  along  B] . B-,  BA.  Along  B:  we  find  from  equation  (3) 

•vw(/.A')  ■*=  x(i,N)  - xp(,.N) 


(5) 


We  can  relate  the  homogeneous  solution  on  any  two  rows  by  using  (M  —2  ) independent 
vectors,  where  M represents  total  number  of  grid  points  in  / including  boundary  points.  Since 
the  solution  along  B:  (given  by  equation  (5))  is  known,  we  will  relate  it  to  the  solution  on  the 
row  just  interior  to  the  boundary  B]  . In  other  words,  if  the  vectors  and  «2  (Figure  1) 
represent  the  homogeneous  solution  on  the  second  and  last  rows  respectively,  then  we  can  find 
an  influence  matrix  R]  , such  that 


*2  “ *1  Rl.i 

(6) 

In  order  to  obtain  R,  j.  we  start  with  the  Aih  unit  vector  along  the  second  row  and  march 
equation  (4)  in  with  zero  boundary  conditions  along  Bx  B:,.  The  Alh  row  elements  of  the 
B\  ] matrix  are  the  elements  of  Eq.  (4)  along  B ; boundary.  By  varying  the  values  of  A from  2 
to  (M  -1 ) we  complete  /?,  , 


From  equation  (6)  and  the  error  vector  (given  by  equation  (5)1  we  compute  the  homo- 
geneous solution  on  the  second  rou  Using  these  values  on  the  second  row  and  zero  boundary 
conditions  on  B\ , By  and  B4  we  march  equation  (4)  in  j to  obtain  the  homogeneous  solution 
throughout  the  region.  Then  by  superposition  of  the  particular  and  homogeneous  solutions  we 
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obtain  the  complete  solution.  Due  to  round  off  errors  in  computing  the  inverse  of  the  matrix 
R\  ],  this  solution  does  not  satisfy  the  boundary  condition  along  B2  exactly.  The  accuracy  of 
the  solution  is  improved  by  recomputing  the  ho-  '•eneous  part  of  the  sol  ion  a few  times. 
Normally,  about  six  iterations  are  required  for  20  grid  points  in  marching  direction  to  obtain  an 
error  of  10  14  on  a computer  with  56  bit  word  precision.  The  number  of  iterations  required 

increases  as  the  number  of  grid  points  in  the  marching  direction  is  increased  up  to  about  25; 
beyond  25  the  method  becomes  unstable. 

The  influence  matrix.  R,  ,.  depends  only  upon  the  coefficients  of  the  elliptic  equation 
and  therefore  can  be  computed  without  knowledge  of  the  forcing  function  and  the  boundary 
conditions.  This  step  is  called  the  preprocessor.  If  we  want  to  solve  equation  (1)  repeatedly 
with  the  same  coefficients  but  with  different  forcing  functions  and  boundary  conditions,  the 
influence  matrix  needs  to  be  computed  only  once  and  stored  in  the  memory  for  further  use. 

Although  the  EVP  method  is  very  efficient  and  requires  a very  small  amount  of  computer 
memory,  it  is  unstable  for  large  number  of  grid  points  in  the  marching  direction  (Meavancey 
and  Leslie.  1972).  For  computers  with  24  bit  precision,  the  limit  is  about  12  grid  points,  which 
may  be  increased  by  going  to  higher  precision  or  by  using  the  two  directonal  marching  method 
described  by  Hirota  et.  al.  (1970).  Even  with  56  bit  precision  and  two  way  marching  the  limit 
is  extended  to  only  about  40  grid  points. 

3.  Stabilized  Error  Vector  Propagation  Method  SEVP) 

In  the  SEVP  method,  the  integration  region  is  divided  into  blocks,  each  of  which  is  stable 
for  the  EVP  method.  Further,  any  two  consecutive  blocks  have  two  rows  in  common.  The 
division  of  the  integration  region  into  three  blocks  is  shown  in  Figure  2.  As  with  the  EVP 
method.  SEVP  is  divided  into  two  steps  the  preprocessor  step  and  the  solution  step.  In  the 
preprocessor  step.  (2XNBLK.-1)  influence  matrices  are  computed  where  NBLK  is  the  total 


4 


NRL  MEMORANDUM  REPORT  3668 


number  of  blocks  NBLK  of  the  influence  matrices  relate  the  values  of  the  homogeneous  solu- 
tion on  the  second  and  last  rows  of  each  block,  while  the  remaining  (NBLK-1)  matrices  relate 
the  homogeneous  solution  on  the  second  rows  of  consecutive  blocks. 


a.  Preprocessor 


The  first  (M  -2)  unit  vectors  on  the  second  row  of  the  first  block  are  used  to  march  in  j 
direction,  using  equation  (4)  and  zero  boundary  conditions  on  B]  B~  and  BA.  to  obtain 
influence  matrices  for  the  last  two  rows  of  the  block.  If  and  *3  (shown  in  Figure  2) 

represent  the  homogeneous  solution  on  the  second  and  last  two  rows  of  the  first  block,  then 


€2  * £l  Ri.l 
*3  “ el  R 1.2 

where  R j ^ and  R12!  are  the  influence  matrices  fot  the  iast  two  rows  of  the  block. 


(7) 

(81 


Combining  (7)  and  (8)  to  eliminate  we  obtain 

( 3 " (2  R i .1 1 R 1 2 “ f ;S] 

(9) 

The  matrices  R j 1 and  5]  relate  homogeneous  solutions  on  the  second  and  last  rows  of  the 
block  and  on  the  last  two  rows  of  the  block,  respectively. 


To  obtain  the  influence  matrices  for  the  second  block,  we  start  with  the  unit  vectors  on 
the  second  row  of  the  second  block,  and  compute  the  corresponding  values  on  the  first  row  of 
the  block  from  equation  (9)  to  get  the  equivalent  of  a boundary  condition  for  this  block.  The 
homogeneous  solution  on  the  first  row  corresponding  to  the  Aih  unit  vector  on  the  second  is 
the  Aih  row  of  the  matrix.  Sj.  Using  (M  —2  ) unit  vectors  on  the  second  row  with  correspond- 
ing vectors  on  the  first  row  and  zero  boundary  values  on  B$  and  fi4  , we  march  equation  (4)  in 
./  to  obtain  influence  matrices  for  the  last  two  rows  of  the  second  block.  If  the  vectors  «4 

and  « 5 represents  a homogeneous  solution  on  the  second  and  last  two  rows  of  the  block. 
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respectively,  then 


and 


e4  ” 6 3 rM 


(10) 


where  R,  ( and  R22  are  the  influence  matrices  for  the  last  two  rows  of  the  block.  Eliminating 
«,  from  (10)  and  (11)  we  obtain 

= €s;  S-.  St  =*  R 1 2*  R21 

4 - - - (12) 
Repeating  the  procedure  described  above  for  the  third  block  (last  block  in  Figure  2)  we  will 
obtain  a matrix,  R32,  relating  homogeneous  solution  on  the  second  and  last  rows  of  the  block. 
If  and  e0  represents  homogeneous  solution  on  these  rows,  then 


6 5 3,2  (13) 

As  in  the  EVP  method,  all  the  influence  matrices  depend  only  on  the  coefficients  of  equation 


(1). 


b.  Solution 

We  obtain  the  required  solution  by  one  forward  and  one  backward  sweep  ot  the  blocks. 
During  the  forward  sweep,  an  approximate  solution  is  obtained  which  satisfies  the  equation  and 
the  boundary  conditions  everywhere  except  on  Boundary  S2.  In  the  backward  sweep,  this 
solution  is  corrected  to  obtain  the  exact  solution. 

The  forward  sweep  is  started  with  the  second  row  on  the  first  block  with  an  initial  guess 
for  the  elements  of  that  row  and  the  given  boundary  values  along  the  first  row.  This  pro- 
cedure is  exactly  the  same  as  EVP.  When  the  solution  has  been  marched  to  the  end  of  the 
first  block,  arbitrary  values  are  assigned  along  the  block  boundary  to  obtain  the  correction  vec- 
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tor,  €3,  The  second  sweep  forward  generates  a homogeneous  solution  for  the  first  block.  Itera- 
tions are  applied,  if  necessary,  to  reduce  roimd  off  error. 

For  the  second  block  we  take  the  arbitrary  solution  assigned  along  the  first  block  boun- 
dary (second  row  second  block)  and  the  solution  aiong  the  first  row  of  the  second  block  and 
sweep  forward  again.  An  arbitrary  solution  is  again  imposed  along  the  last  row  of  the  second 
block  to  obtain  the  correction  vector  e5  . On  the  second  (or  homogeneous)  sweep  of  the 
second  block  we  use  the  correction  vectors  and  e3  to  form  the  homogeneous  solution.  e3  is 
computed  from  by  equation  (11)  and  €3  is  obtained  from  «3  using  equation  (9).  The  pro- 
cedure continues  until  all  blocks  are  swept  out.  For  the  last  block,  we  use  the  boundary  in- 
stead of  a guess  value  to  find  e6. 

The  particular  solution  we  have  obtained  on  the  forward  sweep  satisfies  equation  (1)  and 
the  given  boundary  conditions  everywhere  except  on  the  block  boundaries.  More  importantly, 
the  last  block  contains  the  exact  solution  since  the  computation  of  the  correction  vector  e6  is 
based  upon  the  known  boundary  conditions  along  S3. 

The  backward  sweep  now  corrects  the  errors  introduced  in  the  particular  solution  by 
guessing  the  solutions  along  the  block  boundaries.  Using  equations  (12)  and  (13)  we  calculate 
€4  and  for  the  homogeneous  sweep  of  the  last  block  and  obtain  the  total  solution  for  the 
last  block  which  has  two  rows  in  common  with  the  second  block.  To  obtain  the  homogeneous 
solution  for  the  second  block  we  need  to  find  the  difference.  e5,  between  the  particular  solu- 
tion on  the  last  row  of  the  block  and  the  exact  solution.  One  method  of  obtaining  e5  is  to  store 
the  particular  solution  from  the  forward  sweep  and  subtract  it  from  the  exact  solution.  Howev- 
er, it  is  much  easier  to  recompute  the  particular  solution  by  backing  up  three  rows  into  the 
second  block  and  repeating  a small  segment  of  the  forward  sweep  After  «5  is  obtained  we 
sweep  the  second  block  to  obtain  the  homogeneous  solution  which  is  then  added  to  the  partic- 


7 


RANGARAO v MADALA 


ular  solution.  This  procedure  is  repeated  until  we  finish  the  blocks.  Error  reduction  iterations 
are  applied  at  this  step  also. 

The  stabilization  of  EVP  by  this  method  is  achieved  by  the  introduction  of  the  artificial 
boundaries.  Propagation  of  error  is  very  severe  in  EVP,  and  when  the  error  exceeds  the  accu- 
racy of  the  computer  it  cannot  be  corrected  by  addition  of  a homogeneous  solution.  The  intro- 
duction of  artificial  boundaries  before  the  error  becomes  too  large  limits  the  error  in  each 
block.  Since  the  influence  matrices  for  small  blocks  have  small  error  levels,  the  artificial  solu- 
tion generated  by  the  introduction  of  these  boundaries  can  be  easily  corrected  to  obtain  an  ex- 
tremely accurate  final  solution. 

4.  Computer  Time  and  Memory  Requirements. 

As  a test  problem  for  SEV'P,  Poisson's  equation  is  solved  over  a square  region  with  zero 
homogeneous  boundary  conditions.  The  test  problem  is  also  solved  with  the  Lindzen-Kuo 
(Lindzen  and  Kuo,  1969),Crout  (1943)  and  SOR  methods.  The  former  two  are  direct  solvers. 
The  relaxation  by  SOR  is  stopped  when  the  normalized  error  (normalized  with  the  forcing 
function)  reaches  10  A comparison  of  the  computing  time  taken  by  these  methods  and 
SEVP  is  given  in  Figure  3.  When  the  grid  dimension  exceeds  60  for  Lindzen-Kuo  (L-K)  and 
50  for  Crout  methods  the  computer  memory  is  exhausted.  It  is  clear  from  the  figure  that  the 
SEVP  method  is  more  efficient  than  the  other  three  methods.  For  M = N =40,  the  SEVP 
method  is  about  two  times  faster  than  L-K  and  Crout  methods  and  20  times  faster  than  SOR. 
The  computation  times  given  in  Figure  3 were  obtained  with  double  precision  (56  bit  preci- 
sion) operations  on  a vector  computer. 

Figure  4 shows  the  computer  time  required  by  the  preprocessor  step  and  the  solution  step 
of  SEVP  on  a vector  computer.  The  ratio  of  computing  time  required  by  the  preprocessor  step 


to  that  required  by  the  solution  step  is  about  3 for  M =N  =10  and  increases  with  increase  in 
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grid  dimension.  The  L-K  method  can  also  be  separated  into  a preprocessor  step  (computing  a 
matrices)  and  a solution  step.  For  the  test  problem  single  precision  on  32  bit  word  computer  is 
adequate  for  the  L-K  method. 

Figure  5 gives  a comparison  of  the  computer  time  taken  by  the  single  precision  Lindzen- 
Kuo  and  double  precision  SEVP  methods  to  solve  the  test  problem.  The  top  two  curves  are 
the  total  computer  time  (preprocessor  and  solution  steps)  required  while  the  lower  two  curves 
are  for  the  solution  steps  only.  It  is  clear  from  the  figure  that  SEVP  method  is  faster  than 
Lindzen-Kuo  method  for  both  the  preprocessor  and  solution  sieps. 

Table  1 gives  the  auxiliary  memory  requirements  for  the  three  direct  methods  for  a 32  bit 
word  computer  using  double  precision.  It  is  clear  from  the  table  that  the  SEVP  method's  auxi- 
liary memory  requirement  is  an  order  of  magnitude  smaller  than  that  required  by  the  other  two 
direct  methods. 

5.  Selection  of  the  Marching  Direction  and  the  Block  Size 

If  NB(k)  represents  the  maximum  number  of  points  in  the  marching  direction  of  the  Arth 
block,  then 

NB(k)^PA 

where  P is  a constant  which  depends  only  upon  the  computer  precision  and  A represents  the 
minimum  value  of  CY(i.j)  CX~Hi,j ) if  we  march  in  j and  of  CX(i.j)  CY~[(i,j)  if  we 
march  in  i.  It  is  clear  from  this  equation  that  we  can  reduce  the  auxiliary  memory  required  by 
the  SEVP  method  by  choosing  the  marching  direction  in  such  a way  that  A > 1.  It  can  also 
be  shown  that  the  method  requires  less  computing  time  to  solve  the  elliptic  equation  in  the 
case  when  A > 1 than  in  cases  A < 1.  Therefore,  the  computing  speed  of  the  method  in- 
creases with  |A|.  On  the  other  hand,  the  Lmdzen-Kuo  and  Crout  methods  deteriorate  with  the 
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increasing  A and  became  unstable  for  A ^ 100. 

6.  Summary  and  Conclusions 

A new  method  called  SEVP  was  developed  to  solve  elliptic  equations  which  has  most  of 
the  advantages  ot  the  EVP  method  and  is  stable  for  any  number  of  grid  points.  In  comparison 
with  Lindzen-Kuo,  Crout  and  SOR  methods  this  method  is  faster  and  requires  about  an  order 
of  magnitude  smaller  computer  memory  than  the  Lindzen-Kuo  and  Crout  methods. 

The  procedure  described  in  Section  3 to  solve  elliptic  equations  with  Dirichelet  boundary 
conditions  can  be  easily  be  extended  to  other  boundary  conditions  such  as  Neumann,  periodic 
or  mixed.  It  can  be  also  be  used  to  solve  elliptic  equations  over  a region  with  irregular  boun- 
daries. 
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Table  I - A Comparison  of  the  Auxiliary  Memory  Requirements 
(in  thousands  of  words)  for  SEVP,  Lindzen-Kuo  and  Crout 
Methods  on  32  Bit  Word  Computer  Using  Double  Precision. 


GRID  SIZE 

LINZEN-K.UO 

10 

4.4 

0.2 

2.8 

20 

33.2 

2.4 

17.6 

30 

111.6 

5.4 

57.6 

40 

262.4 

16.0 

134.4 

50 

510.0** 

260.0 

60 

878.4* 

50.4 

446.4 

70 

1391.6* 

88.2 

705.6* 

80 

2073.6 

115.1 

1049.6* 

90 

2978.4* 

178.2 

1490.4* 

100 

4040.0* 

220.0 

2040.0* 

•More  than  the  total  central  memory  available  on  computer  at  the 
Naval  Research  Laboratory. 

“Normalized  error  is  more  than  10~*. 
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Fig.  2 - A three  block  division  ot'  the  region 
for  the  SEVP  method. 
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- PREPROCESSOR  AND  SOLUTION  (SEVP)' 

PREPROCESSOR  AND  SOLUTION  (L  - K)  , 

SOLUTION  (L-K) 

SOLUTION  (SEVP) 


GRIO  DIMENSION  (M*fSI) 

Fig.  3 - A comparison  of  the  computing  time  requirements 
for  the  SEVP,  Lindzen-Kuo,  Crout  and  SOR  methods. 

'0  ; 


OREPROCESSORISEVP) 

SOLUTION  (SEVP) 


GRIO  OIMENSION(M»N) 


Fig.  4 - Computer  time  taken  by  the  preprocessor  and 
solution  steps  of  the  SEVP  method. 
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Fig-  5 - A comparison  ot  the  computer  time  requirements 
of  singJe  precision  Lindzen-Kuo  and  double  precision  SEVP 
methods  using  a vector  computer  with  32  bit  word  length. 
The  top  curves  give  the  total  rime  taken  by  both  steps: 
the  preprocesser  and  the  solution  steps.  The  lower  two 
curves  give  the  time  taken  by  the  solution  steps. 
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